An Alignment-Free Distance Measure for Closely Related Genomes
Abstract
Phylogeny reconstruction on a genome scale remains computationally challenging even for closely related organisms. Here we propose an alignment-free pairwise distance measure, Kr, for genomes separated by less than approximately 0.5 mismatches/nucleotide. We have implemented the computation of Kr based on enhanced suffix arrays in the program kr, which is freely available from guanine.evolbio.mpg.de/kr/. The software is applied to genomes obtained from three sets of taxa: 27 primate mitochondria, eight Staphylococcus agalactiae strains, and 12 Drosophila species. Subsequent clustering of the Kr values always recovers phylogenies that are similar or identical to the accepted branching order.
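The clustering step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which clustering method was used, so average-linkage clustering (UPGMA) is assumed here purely for illustration. The function name `upgma`, the taxon labels, and the distance values are all hypothetical placeholders and are not Kr values from the paper.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa by average linkage (UPGMA) and return a nested-tuple tree.

    names: list of taxon labels
    dist:  dict mapping frozenset({a, b}) to the pairwise distance of a and b
    """
    # Track each current cluster (a label or a nested tuple) and its leaf count.
    sizes = {name: 1 for name in names}
    d = dict(dist)  # work on a copy so the caller's matrix is left untouched
    while len(sizes) > 1:
        # Pick the closest pair among the current clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        na, nb = sizes.pop(a), sizes.pop(b)
        # Distance from the merged cluster to every other cluster is the
        # size-weighted average of the two old distances.
        for c in sizes:
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        sizes[merged] = na + nb
    return next(iter(sizes))


# Hypothetical pairwise distances for four made-up taxa (not real data).
names = ["A", "B", "C", "D"]
dist = {
    frozenset(("A", "B")): 0.1,
    frozenset(("A", "C")): 0.4,
    frozenset(("A", "D")): 0.5,
    frozenset(("B", "C")): 0.4,
    frozenset(("B", "D")): 0.5,
    frozenset(("C", "D")): 0.2,
}
tree = upgma(names, dist)
print(tree)
```

With real data, the dictionary would hold one distance per genome pair as reported by a tool such as kr. The nested tuple gives only the branching order, without branch lengths.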
Similar resources
gmos: Rapid Detection of Genome Mosaicism over Short Evolutionary Distances
Prokaryotic and viral genomes are often altered by recombination and horizontal gene transfer. The existing methods for detecting recombination are primarily aimed at viral genomes or sets of loci, since the expensive computation of underlying statistical models often hinders the comparison of complete prokaryotic genomes. As an alternative, alignment-free solutions are more efficient, but cann...
Alignment-Free Genome Tree Inference by Learning Group-Specific Distance Metrics
Understanding the evolutionary relationships between organisms is vital for their in-depth study. Gene-based methods are often used to infer such relationships, which are not without drawbacks. One can now attempt to use genome-scale information, because of the ever increasing number of genomes available. This opportunity also presents a challenge in terms of computational efficiency. Two funda...
Genomic Classification Using an Information-Based Similarity Index: Application to the SARS Coronavirus
Measures of genetic distance based on alignment methods are confined to studying sequences that are conserved and identifiable in all organisms under study. A number of alignment-free techniques based on either statistical linguistics or information theory have been developed to overcome the limitations of alignment methods. We present a novel alignment-free approach to measuring the similarity...
Accurately Measuring Recombination between Closely Related HIV-1 Genomes
Retroviral recombination is thought to play an important role in the generation of immune escape and multiple drug resistance by shuffling pre-existing mutations in the viral population. Current estimates of HIV-1 recombination rates are derived from measurements within reporter gene sequences or genetically divergent HIV sequences. These measurements do not mimic the recombination occurring in...
Estimating Mutation Distances from Unaligned Genomes
Alignment-free distance measures are generally less accurate but more efficient than traditional alignment-based metrics. In the context of genome sequence analysis, the efficiency gain is often so substantial that it outweighs the loss in accuracy. However, a further disadvantage of alignment-free distances is that their relationship to evolutionary events such as substitutions is ge...